Global pooling is one of the most significant operations in many machine learning models and tasks, which works for information fusion and structured data (like sets and graphs) representation. However, without solid mathematical fundamentals, its practical implementations often depend on empirical mechanisms and thus lead to sub-optimal, even unsatisfactory performance. In this work, we develop a novel and generalized global pooling framework through the lens of optimal transport. The proposed framework is interpretable from the perspective of expectation-maximization. Essentially, it aims at learning an optimal transport across sample indices and feature dimensions, making the corresponding pooling operation maximize the conditional expectation of input data. We demonstrate that most existing pooling methods are equivalent to solving a regularized optimal transport (ROT) problem with different specializations, and more sophisticated pooling operations can be implemented by hierarchically solving multiple ROT problems. Making the parameters of the ROT problem learnable, we develop a family of regularized optimal transport pooling (ROTP) layers. We implement the ROTP layers as a new kind of deep implicit layer. Their model architectures correspond to different optimization algorithms. We test our ROTP layers in several representative set-level machine learning scenarios, including multi-instance learning (MIL), graph classification, graph set representation, and image classification. Experimental results show that applying our ROTP layers can reduce the difficulty of the design and selection of global pooling -- our ROTP layers may either imitate some existing global pooling methods or lead to some new pooling layers fitting data better. The code is available at \url{https://github.com/SDS-Lab/ROT-Pooling}.
translated by 谷歌翻译
While many systems have been developed to train Graph Neural Networks (GNNs), efficient model inference and evaluation remain to be addressed. For instance, using the widely adopted node-wise approach, model evaluation can account for up to 94% of the time in the end-to-end training process due to neighbor explosion, which means that a node accesses its multi-hop neighbors. On the other hand, layer-wise inference avoids the neighbor explosion problem by conducting inference layer by layer such that the nodes only need their one-hop neighbors in each layer. However, implementing layer-wise inference requires substantial engineering efforts because users need to manually decompose a GNN model into layers for computation and split workload into batches to fit into device memory. In this paper, we develop Deep Graph Inference (DGI) -- a system for easy and efficient GNN model inference, which automatically translates the training code of a GNN model for layer-wise execution. DGI is general for various GNN models and different kinds of inference requests, and supports out-of-core execution on large graphs that cannot fit in CPU memory. Experimental results show that DGI consistently outperforms layer-wise inference across different datasets and hardware settings, and the speedup can be over 1,000x.
translated by 谷歌翻译
图形神经网络(GNN)是具有无核数据的应用的有前途的方法。但是,具有数亿节点的大规模图上的培训GNN既是资源又是耗时的。与DNN不同,GNN通常具有更大的内存足迹,因此GPU内存能力和PCIE带宽是GNN培训中的主要资源瓶颈。为了解决此问题,我们提出分叉:一种图形量化方法,通过显着减少内存足迹和PCIE带宽要求来加速GNN训练,以便GNN可以充分利用GPU计算功能。我们的关键见解是,与DNN不同,GNN不太容易发生量化引起的输入特征的信息丢失。我们确定图形特征量化中的主要准确性影响因素,从理论上证明,分叉训练会收敛到网络,在该网络中,损失在未压缩网络的最佳损失的$ \ epsilon $之内。我们使用几种流行的GNN模型和数据集对分叉进行了广泛的评估,包括最大的公共图数据集MAG240M上的图形。结果表明,分叉达到30以上的压缩率,并在边际准确性损失的情况下提高了GNN训练速度200%-320%。特别是,分叉在一小时内仅使用四个GPU在MAG240M上的训练图来实现记录。
translated by 谷歌翻译
静电执行器为创建软机器人板提供了一种有希望的方法,因为它们的柔性外形,模块化集成和快速响应速度。但是,它们的控制需要千伏信号,并理解由板上和环境效应的力相互作用引起的复杂动力学。在这项工作中,我们演示了一个不受限制的二维五实机压电机器人,该机器人由电池和板载高压电路提供动力,并通过无线链路进行控制。可扩展的制造方法基于彼此之间的键合化层(钢箔底物,执行器,柔性电子设备)。机器人表现出一系列可控运动,包括双向爬行(高达〜0.6 cm/s),转弯和现场旋转(约1度/s)。高速视频和控制实验表明,运动的丰富性是由于机器人中不对称质量分布的相互作用以及动力学对压电驱动频率的相关依赖性。
translated by 谷歌翻译
全球合并是许多机器学习模型和任务中最重要的操作之一,但是在实践中,其实施通常是经验的。在这项研究中,我们通过最佳运输镜头开发了一个新颖而坚实的全球合并框架。我们证明,大多数现有的全球合并方法等同于解决不平衡最佳运输(UOT)问题的一些专业。使UOT问题的参数可学习,我们在同一框架中统一了各种全局合并方法,因此,为神经网络提出了一个称为UOT-Pooling(UOTP)的广义全局池层。除了基于经典的Sinkhorn尺度算法实现UOTP层外,我们设计了一种基于Bregman ADMM算法的新模型体系结构,该体系结构具有更好的数值稳定性,并且可以更有效地重现现有的池化层。我们在几种应用程序方案中测试了UOTP层,包括多构度学习,图形分类和图像分类。我们的UOTP层可以模仿常规的全球合并层,也可以学习一些新的合并机制,从而提高性能。
translated by 谷歌翻译
电驱动的软机器人能够实现小型和灯体,以及环境兼容性,各种运动和安全操作。特别地,静电致动器(例如,压电致动器)快速响应。但是,可扩展的无缝集成和不可阻止操作的方法仍不清楚。此外,软体自然建模,包括环境互动,是一个长期存在的挑战。此外,需要探索更多的机器机制。在本文中,我们设计了模型,建模并展示了一个软机器人,这是第一次开始解决所有这些问题。它具有平面结构的五个执行器的线性阵列,用于集成和自由操作的开门。通过依靠姿势自我调整,设计和验证了一种新的九寸式捕获的爬行运动机制。通过实验开发并验证了包括井解释机器人运动的压电,重力和地面相互作用的第一分析软体模型。我们展示了机器人的前向和向后运动,并探索了有效载荷和驾驶速度的影响:每循环的1.2 mm运动,在移动时可以携带高达200克的有效载荷(16倍体重)。这项工作为复杂的未知环境中的快速响应机器人铺平了道路。
translated by 谷歌翻译
本文回顾了关于压缩视频质量增强质量的第一个NTIRE挑战,重点是拟议的方法和结果。在此挑战中,采用了新的大型不同视频(LDV)数据集。挑战有三个曲目。Track 1和2的目标是增强HEVC在固定QP上压缩的视频,而Track 3旨在增强X265压缩的视频,以固定的位速率压缩。此外,轨道1和3的质量提高了提高保真度(PSNR)的目标,以及提高感知质量的2个目标。这三个曲目完全吸引了482个注册。在测试阶段,分别提交了12个团队,8支球队和11支球队,分别提交了轨道1、2和3的最终结果。拟议的方法和解决方案衡量视频质量增强的最先进。挑战的首页:https://github.com/renyang-home/ntire21_venh
translated by 谷歌翻译
Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works utilize separated approaches to handle thing, stuff, and part predictions without shared computation and task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework named Panoptic-PartFormer. Moreover, we find the previous metric PartPQ biases to PQ. To handle both issues, we make the following contributions: Firstly, we design a meta-architecture that decouples part feature and things/stuff feature, respectively. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term our model as Panoptic-PartFormer. Secondly, we propose a new metric Part-Whole Quality (PWQ) to better measure such task from both pixel-region and part-whole perspectives. It can also decouple the error for part segmentation and panoptic segmentation. Thirdly, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross attention scheme to further boost part segmentation qualities. We design a new part-whole interaction method using masked cross attention. Finally, the extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with previous Panoptic-PartFormer, our Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost drop of 70% on GFlops and 50% on parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.
translated by 谷歌翻译
Rankings are widely collected in various real-life scenarios, leading to the leakage of personal information such as users' preferences on videos or news. To protect rankings, existing works mainly develop privacy protection on a single ranking within a set of ranking or pairwise comparisons of a ranking under the $\epsilon$-differential privacy. This paper proposes a novel notion called $\epsilon$-ranking differential privacy for protecting ranks. We establish the connection between the Mallows model (Mallows, 1957) and the proposed $\epsilon$-ranking differential privacy. This allows us to develop a multistage ranking algorithm to generate synthetic rankings while satisfying the developed $\epsilon$-ranking differential privacy. Theoretical results regarding the utility of synthetic rankings in the downstream tasks, including the inference attack and the personalized ranking tasks, are established. For the inference attack, we quantify how $\epsilon$ affects the estimation of the true ranking based on synthetic rankings. For the personalized ranking task, we consider varying privacy preferences among users and quantify how their privacy preferences affect the consistency in estimating the optimal ranking function. Extensive numerical experiments are carried out to verify the theoretical results and demonstrate the effectiveness of the proposed synthetic ranking algorithm.
translated by 谷歌翻译
In this work, we focus on instance-level open vocabulary segmentation, intending to expand a segmenter for instance-wise novel categories without mask annotations. We investigate a simple yet effective framework with the help of image captions, focusing on exploiting thousands of object nouns in captions to discover instances of novel classes. Rather than adopting pretrained caption models or using massive caption datasets with complex pipelines, we propose an end-to-end solution from two aspects: caption grounding and caption generation. In particular, we devise a joint Caption Grounding and Generation (CGG) framework based on a Mask Transformer baseline. The framework has a novel grounding loss that performs explicit and implicit multi-modal feature alignments. We further design a lightweight caption generation head to allow for additional caption supervision. We find that grounding and generation complement each other, significantly enhancing the segmentation performance for novel categories. We conduct extensive experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS). The results demonstrate the superiority of our CGG framework over previous OVIS methods, achieving a large improvement of 6.8% mAP on novel classes without extra caption data. Our method also achieves over 15% PQ improvements for novel classes on the OSPS benchmark under various settings.
translated by 谷歌翻译